cli: show debug options on --verbose #9693

Merged (10 commits) on Jul 3, 2023

Erotemic (Contributor) commented Jul 3, 2023

Digging through the code, I saw there were a lot of built-in tools for profiling and debugging, but (as far as I can tell) there was no mention of them in the docs, nor was there any way for a user to become aware of them without digging into the code.

To remedy this, I added an environment variable DVC_SHOW_DEBUG_OPTIONS (which could perhaps be shortened to DVC_DEBUG). If it is set, the hidden CLI options are shown to the user; otherwise the behavior is unchanged.
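For readers unfamiliar with the mechanism, here is a minimal sketch of the environment-variable approach using only the standard library's argparse: the debug options stay parseable, but get `help=argparse.SUPPRESS` unless DVC_SHOW_DEBUG_OPTIONS is set. The `build_parser` function and its two flags are illustrative, not DVC's actual parser code.

```python
import argparse
import os


def build_parser(environ=None):
    """Toy parser: debug flags are hidden from --help unless the env var is set.

    DVC_SHOW_DEBUG_OPTIONS is the variable name proposed in this PR; the
    parser and flags here are illustrative only.
    """
    if environ is None:
        environ = os.environ
    show_debug = bool(environ.get("DVC_SHOW_DEBUG_OPTIONS", ""))

    def debug_help(text):
        # argparse.SUPPRESS hides the option from usage and help output,
        # but the option is still accepted on the command line.
        return text if show_debug else argparse.SUPPRESS

    parser = argparse.ArgumentParser(prog="dvc")
    parser.add_argument("--cprofile", action="store_true",
                        help=debug_help("Generate cprofile data"))
    parser.add_argument("--pdb", action="store_true",
                        help=debug_help("Drop into the pdb/ipdb debugger on any exception"))
    return parser
```

Because hidden options keep working, scripts that already pass these flags are unaffected by whether the help text is shown.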

Also, when I tried to run a command with --show-stack I didn't see any difference in the output. So I'm not sure what it's actually doing.

I've added a section in the contributing docs which mentions this new option. The corresponding PR is here: iterative/dvc.org#4662

skshetry (Member) commented Jul 3, 2023

They are documented in the wiki, https://github.com/iterative/dvc/wiki/Debugging,-Profiling-and-Benchmarking-DVC, although they may be hard to find.

> Also, when I tried to run a command with --show-stack I didn't see any difference in the output. So I'm not sure what it's actually doing.

It only works on macOS and Linux. While dvc is running with --show-stack, press Ctrl+T on macOS or Ctrl+\ (backslash) on Linux, and it prints a traceback showing where the main thread currently is.
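A minimal sketch of how a --show-stack style feature can be built on the standard library, assuming a Python signal handler for SIGQUIT (sent by Ctrl+\ in a terminal) and SIGINFO (sent by Ctrl+T on macOS). The helper name `install_stack_dumper` is hypothetical, and this is not necessarily how DVC implements the flag.

```python
import signal
import sys
import traceback


def install_stack_dumper(stream=None):
    """Print the current stack whenever SIGQUIT or SIGINFO is received.

    In a terminal, Ctrl+backslash sends SIGQUIT (Linux and macOS) and Ctrl+T
    sends SIGINFO (macOS only). Neither exists on Windows, so nothing is
    installed there. Returns the signals a handler was installed for.
    Must be called from the main thread (a requirement of signal.signal).
    """

    def _dump(signum, frame):
        out = stream if stream is not None else sys.stderr
        # `frame` is the Python frame that was running when the signal arrived.
        print(f"--- stack at signal {signum} ---", file=out)
        traceback.print_stack(frame, file=out)

    installed = []
    for name in ("SIGQUIT", "SIGINFO"):
        sig = getattr(signal, name, None)  # None where the platform lacks it
        if sig is not None:
            signal.signal(sig, _dump)
            installed.append(sig)
    return installed
```

Python runs signal handlers in the main thread, so the printed frame is the main thread's current position, which matches the behavior described above. On Windows the loop installs nothing, consistent with the "Unavailable on Windows" note in the help text.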

Erotemic (Contributor, author) commented Jul 3, 2023

Thanks for the pointer! It probably makes sense to replace the blurb I wrote in the contributing docs with a reference to this wiki. I think having this info in the contributing docs will make it much easier to find, because the people who need debugging tools (i.e. contributors) should be reading those docs.

Erotemic (Contributor, author) commented Jul 3, 2023

I updated the help strings based on info in the wiki, and I also modified the docs PR to point to the wiki.

The new output of DVC_SHOW_DEBUG_OPTIONS=1 dvc --help looks like this:

(pyenv3.11.2) joncrall@toothbrush:~/code/dvc$ DVC_SHOW_DEBUG_OPTIONS=1 dvc --help
usage: dvc [--cprofile] [--yappi] [--yappi-separate-threads] [--viztracer] [--viztracer-depth VIZTRACER_DEPTH] [--viztracer-async] [--cprofile-dump CPROFILE_DUMP] [--pdb] [--instrument]
           [--instrument-open] [--show-stack] [-q | -v] [-h] [-V] [--cd <path>]
           COMMAND ...

Data Version Control

options:
  --cprofile            Generate cprofile data for tools like snakeviz / tuna
  --yappi               Generate a callgrind file for use with tools like kcachegrind / qcachegrind
  --yappi-separate-threads
                        Generate one callgrind file per thread
  --viztracer           Generate a viztracer file for use with vizviewer
  --viztracer-depth VIZTRACER_DEPTH
                        Set viztracer maximum stack depth
  --viztracer-async     Treat async tasks as threads
  --cprofile-dump CPROFILE_DUMP
                        Location to dump cprofile file
  --pdb                 Drop into the pdb/ipdb debugger on any exception
  --instrument          Use pyinstrument CLI profiler
  --instrument-open     Use pyinstrument web profiler
  --show-stack, --ss    Use Ctrl+T on macOS or Ctrl+\ on Linux to print the stack frame currently executing. Unavailable on Windows.
  -q, --quiet           Be quiet.
  -v, --verbose         Be verbose.
  -h, --help            Show this help message and exit.
  -V, --version         Show program's version.
  --cd <path>           Change to directory before executing.

Available Commands:
  COMMAND               Use `dvc COMMAND --help` for command-specific help.
    init                Initialize DVC in the current directory.
    queue               Commands to manage experiments queue.
    get                 Download file or directory tracked by DVC or by Git.
    get-url             Download or copy files from URL.
    destroy             Remove DVC files, local DVC config and data cache.
    add                 Track data files or directories with DVC.
    remove              Remove stages from dvc.yaml and/or stop tracking files or directories.
    move                Rename or move a DVC controlled data file or a directory.
    unprotect           Unprotect tracked files or directories (when hardlinks or symlinks have been enabled with `dvc config cache.type`).
    repro               Reproduce complete or partial pipelines by executing their stages.
    pull                Download tracked files or directories from remote storage.
    push                Upload tracked files or directories to remote storage.
    fetch               Download files or directories from remote storage to the cache.
    status              Show changed stages, compare local cache and a remote storage.
    gc                  Garbage collect unused objects from cache or remote storage.
    import              Download file or directory tracked by DVC or by Git into the workspace, and track it.
    import-url          Download or copy file from URL and take it under DVC control.
    config              Get or set config options.
    checkout            Checkout data files from cache.
    remote              Set up and manage data remotes.
    cache               Manage cache settings.
    metrics             Commands to display and compare metrics.
    params              Commands to display params.
    install             Install DVC git hooks into the repository.
    root                Return the relative path to the root of the DVC project.
    list (ls)           List repository contents, including files and directories tracked by DVC and by Git.
    list-url (ls-url)   List directory contents from URL.
    freeze              Freeze stages or .dvc files.
    unfreeze            Unfreeze stages or .dvc files.
    dag                 Visualize DVC project DAG.
    commit              Record changes to files or directories tracked by DVC by storing the current versions in the cache.
    completion          Generate shell tab completion.
    diff                Show added, modified, or deleted data between commits in the DVC repository, or between a commit and the workspace.
    version (doctor)    Display the DVC version and system/environment information.
    update              Update data artifact imported (via dvc import or dvc import-url) from an external DVC repository or URL.
    plots               Commands to visualize and compare plot data.
    stage               Commands to list and create stages.
    experiments (exp)   Commands to run and compare experiments.
    check-ignore        Check whether files or directories are excluded due to `.dvcignore`.
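The --cprofile and --cprofile-dump help strings above describe writing profiler data for viewers such as snakeviz or tuna. A rough stdlib-only sketch of that idea follows; `run_with_cprofile` and `top_functions` are hypothetical helpers, not DVC's code.

```python
import cProfile
import pstats


def run_with_cprofile(func, dump_path, *args, **kwargs):
    """Run func under cProfile and write the stats file to dump_path.

    The dump can be opened with viewers such as snakeviz or tuna.
    """
    profiler = cProfile.Profile()
    try:
        return profiler.runcall(func, *args, **kwargs)
    finally:
        # Dump even if func raises, so a crashing run can still be inspected.
        profiler.dump_stats(dump_path)


def top_functions(dump_path, limit=5):
    """Print the most expensive calls (by cumulative time) from a dump."""
    stats = pstats.Stats(dump_path)
    stats.sort_stats("cumulative").print_stats(limit)
    return stats
```

Dumping in a `finally` block is a design choice for debugging: the profile of a failing command is often the one you want to look at.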

Review thread on dvc/_debug.py (outdated):
parser.add_argument("--yappi", action="store_true", default=False, help=SUPPRESS)
# For detailed info see:
# https://github.com/iterative/dvc/wiki/Debugging,-Profiling-and-Benchmarking-DVC
dvc_show_debug_options = os.environ.get("DVC_SHOW_DEBUG_OPTIONS", "")
Contributor:

Seems a bit too verbose. Maybe a --debug flag would be better? Do you know of any examples of how other projects handle this?

Member:

In the long term, I'd really like to introduce the concept of "extensions", even if only for our own use. We could then enable extra functionality when, for example, the devel extension is enabled in the config.

Contributor (author):

I tend to opt for verbosity when I'm unsure. I think DVC_DEBUG=1 <cmd> would likely work as well.

I don't see an easy way to add a --debug option, because the argparse object hasn't finished being constructed at the point where we need to decide whether to use SUPPRESS or add a help message. I think the environment variable approach is the cleanest way to conditionally expose these options in --help.

The other options I can think of are:

  • Add a --debug flag which prints a special help section, but that would need to be maintained separately, so that's undesirable.

  • Check if "--debug" in sys.argv, but that's undesirable because 1. the parser might not be applied to sys.argv, and 2. --debug might appear on the command line without being intended as a flag (e.g. dvc add -- --debug). Checking sys.argv won't respect -- the way argparse does. These issues probably won't arise in practice, and even if they did, it would likely not matter, because it only changes the content of the help message. Still, I think it is unclean design, and the environment variable approach seems like the "idiomatic" thing to do here. That being said, "--debug" in sys.argv is simple, and I'm open to doing it this way. The only thing I think is important is that there is a way for a developer to unhide these options.
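To make the -- concern concrete, here is a small sketch comparing a plain membership test with argparse. The --debug flag and both helper functions are hypothetical, used only to illustrate the point.

```python
import argparse


def naive_debug_check(argv):
    # Plain membership test: ignores the "--" end-of-options marker.
    return "--debug" in argv


def argparse_debug_check(argv):
    # argparse treats every token after a bare "--" as positional,
    # so a literal "--debug" there is not mistaken for the flag.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("rest", nargs="*")
    return parser.parse_args(argv).debug
```

For `["add", "--", "--debug"]` the naive check reports the flag while argparse does not; as noted above, the practical impact is limited to what the help message shows.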

Contributor:

@skshetry Seems like total overkill for this purpose. But I'm not even sure what extensions you mean; something like git extensions? I don't quite see the need.

Contributor:

@Erotemic DVC_DEBUG sounds fine to me, let's not overthink it.

Member (skshetry), Jul 3, 2023:

Looking at Mercurial, it shows extra options, including debugging-related ones, only when --verbose is set.

Contributor (efiop), Jul 3, 2023:

@Erotemic regarding the cli option, there is a way to partially parse known options in argparse IIRC. Just checking argv (not sys.argv) is also fine. Mercurial's --verbose sounds pretty good too.
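The partial parsing mentioned here is argparse's `ArgumentParser.parse_known_args`, which returns the recognized options plus a list of leftovers instead of exiting on unknown arguments. A minimal sketch, with a hypothetical `peek_verbose` helper:

```python
import argparse


def peek_verbose(argv):
    """Pre-parse only -v/--verbose and ignore every other argument.

    parse_known_args returns (namespace, leftovers) rather than erroring
    on unknown options, so it can run before the full parser is built.
    add_help=False keeps a "--help" in argv from printing help and exiting.
    """
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("-v", "--verbose", action="count", default=0)
    namespace, _unknown = pre.parse_known_args(argv)
    return namespace.verbose
```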

Contributor (author):

Switched to DVC_DEBUG and added it to env.

I do like the idea of showing the extra help when verbose is set. At first I wasn't sure how to do it, but I think it's possible as long as the parser is initialized with add_help=False (which it is) and we ensure the verbose argument is added to the parser before we call add_debugging_flags, which should just require reordering the code in get_parent_parser.

Now that I see how to do this, I think it would be a good idea to remove the DVC_DEBUG environment variable entirely and just show the extra help options if -v is given (which will also make it much easier for a developer to stumble on them, which is good).

Contributor (author):

I made the above change. I can revert it if you want to go with the environment variable, but I think just respecting -v is much cleaner now that I know about the add_help argument to ArgumentParser.
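A sketch of the ordering described above, assuming the parent parser only needs -q/-v defined before the debug flags are added. `build_parent_parser` is a simplified, hypothetical stand-in, not the body of DVC's get_parent_parser.

```python
import argparse


def build_parent_parser(argv=None):
    """Parent parser whose debug flags only appear in help when -v is given.

    -q/-v are added first, the partially built parser is pre-parsed with
    parse_known_args, and only then are the debug flags added.
    add_help=False matters: with the default add_help=True, a "--help"
    in argv would print help and exit during the pre-parse.
    """
    parent = argparse.ArgumentParser(add_help=False)
    log_level = parent.add_mutually_exclusive_group()
    log_level.add_argument("-q", "--quiet", action="count", default=0)
    log_level.add_argument("-v", "--verbose", action="count", default=0)

    # Unknown arguments (subcommands, --help, debug flags) end up in `_`.
    known, _ = parent.parse_known_args(argv)
    show_debug = known.verbose > 0

    def debug_help(text):
        return text if show_debug else argparse.SUPPRESS

    parent.add_argument("--cprofile", action="store_true",
                        help=debug_help("Generate cprofile data"))
    parent.add_argument("--pdb", action="store_true",
                        help=debug_help("Drop into the pdb/ipdb debugger on any exception"))
    return parent
```

One caveat of this sketch: because -q and -v are mutually exclusive, the pre-parse itself exits with an error if both are given, which a real implementation may want to handle differently.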

Contributor:

Thank you @Erotemic and @skshetry 🙏

Erotemic (Contributor, author) commented Jul 3, 2023

While playing with this, I noticed two things and made changes accordingly. First, the "cprofile-dump" option was not next to "cprofile", so I moved it to keep similar options grouped. Second, when running DVC_SHOW_DEBUG_OPTIONS=1 dvc add --help, the debug options were interspersed with the add options, so I added an argument group so that they are always displayed separately. E.g.

(pyenv3.11.2) joncrall@toothbrush:~/remote/toothbrush/data/dvc-repos/smart_data_dvc-ssd/Drop7-Cropped2GSD/CH_R001/CH_R001_CLUSTER_041$ DVC_SHOW_DEBUG_OPTIONS=1 dvc add --help
usage: dvc add [-h] [--cprofile] [--cprofile-dump CPROFILE_DUMP] [--yappi] [--yappi-separate-threads] [--viztracer] [--viztracer-depth VIZTRACER_DEPTH] [--viztracer-async] [--pdb]
               [--instrument] [--instrument-open] [--show-stack] [-q | -v] [--no-commit] [--glob] [-o <path>] [--to-remote] [-r <name>] [--remote-jobs <number>] [-f]
               targets [targets ...]

Track data files or directories with DVC.
Documentation: <https://man.dvc.org/add>

positional arguments:
  targets               Input files/directories to add.

options:
  -h, --help            show this help message and exit
  -q, --quiet           Be quiet.
  -v, --verbose         Be verbose.
  --no-commit           Don't put files/directories into cache.
  --glob                Allows targets containing shell-style wildcards.
  -o <path>, --out <path>
                        Destination path to put files to.
  --to-remote           Download it directly to the remote
  -r <name>, --remote <name>
                        Remote storage to download to
  --remote-jobs <number>
                        Only used along with '--to-remote'. Number of jobs to run simultaneously when pushing data to remote.The default value is 4 * cpu_count().
  -f, --force           Override local file or folder if exists.

debug options:
  --cprofile            Generate cprofile data for tools like snakeviz / tuna
  --cprofile-dump CPROFILE_DUMP
                        Location to dump cprofile file
  --yappi               Generate a callgrind file for use with tools like kcachegrind / qcachegrind
  --yappi-separate-threads
                        Generate one callgrind file per thread
  --viztracer           Generate a viztracer file for use with vizviewer
  --viztracer-depth VIZTRACER_DEPTH
                        Set viztracer maximum stack depth
  --viztracer-async     Treat async tasks as threads
  --pdb                 Drop into the pdb/ipdb debugger on any exception
  --instrument          Use pyinstrument CLI profiler
  --instrument-open     Use pyinstrument web profiler
  --show-stack, --ss    Use Ctrl+T on macOS or Ctrl+\ on Linux to print the stack frame currently executing. Unavailable on Windows.
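The separate section comes from argparse's `add_argument_group`: options added to a group are printed under the group's own heading, after the regular options. A minimal sketch with a few flags copied from the output above (the `build_add_parser` function is illustrative, not DVC's code):

```python
import argparse


def build_add_parser():
    """Toy `dvc add`-like parser with debug flags in their own help section."""
    parser = argparse.ArgumentParser(prog="dvc add")
    parser.add_argument("targets", nargs="+",
                        help="Input files/directories to add.")
    parser.add_argument("--no-commit", action="store_true",
                        help="Don't put files/directories into cache.")

    # Options added to this group are listed under "debug options:" in --help
    # instead of being mixed in with the command's own options.
    debug = parser.add_argument_group("debug options")
    debug.add_argument("--cprofile", action="store_true",
                       help="Generate cprofile data for tools like snakeviz / tuna")
    debug.add_argument("--pdb", action="store_true",
                       help="Drop into the pdb/ipdb debugger on any exception")
    return parser
```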

codecov bot commented Jul 3, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.02 🎉

Comparison is base (58eb145) 90.53% compared to head (29fb4d4) 90.55%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9693      +/-   ##
==========================================
+ Coverage   90.53%   90.55%   +0.02%     
==========================================
  Files         480      480              
  Lines       36382    36392      +10     
  Branches     5101     5228     +127     
==========================================
+ Hits        32938    32955      +17     
+ Misses       2852     2849       -3     
+ Partials      592      588       -4     
Impacted Files Coverage Δ
dvc/_debug.py 28.66% <100.00%> (+3.49%) ⬆️
dvc/cli/parser.py 100.00% <100.00%> (ø)

... and 7 files with indirect coverage changes

efiop merged commit 541def3 into iterative:main on Jul 3, 2023 (20 checks passed)
efiop added the label "A: debug" (Related to dvc debugging tools) on Jul 3, 2023
skshetry changed the title from "environment variable to show debug options" to "cli: show debug options on --verbose" on Jul 4, 2023

3 participants